Computation of Normalized Edit Distance and Applications

Authors

  • Andrés Marzal
  • Enrique Vidal
Abstract

Given two strings X and Y over a finite alphabet, the normalized edit distance between X and Y, d(X, Y), is defined as the minimum of W(P) / L(P), where P is an editing path between X and Y, W(P) is the sum of the weights of the elementary edit operations of P, and L(P) is the number of these operations (the length of P). In this paper, it is shown that, in general, d(X, Y) cannot be computed by first obtaining the conventional (unnormalized) edit distance between X and Y and then normalizing this value by the length of the corresponding editing path. To compute normalized edit distances, a new algorithm is proposed that can be implemented to work in O(m·n²) time and O(n²) memory space, where m and n are the lengths of the strings under consideration and m ≥ n. Experiments in handwritten digit recognition are presented, revealing that the normalized edit distance consistently provides better results than both unnormalized and post-normalized classical edit distances.

Index Terms: Editing, Levenshtein distance, normalized edit distance, optical character recognition, pattern recognition, speech recognition, spelling correction, string correction.
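The definition above can be illustrated with a small Python sketch. It is not the authors' algorithm as published: it is a straightforward dynamic program that tracks the minimum weight for every possible path length and then takes the best ratio W(P) / L(P). The function name normalized_edit_distance and the parameters sub and indel are hypothetical, and the sketch assumes one fixed weight per operation type, with matches counted as zero-weight operations. It runs in O(m·n·(m+n)) time and makes no attempt to reach the O(n²) memory bound stated in the abstract.

```python
import math


def normalized_edit_distance(x, y, sub=1.0, indel=1.0):
    """Minimum of W(P) / L(P) over all editing paths P from x to y.

    Assumption (not from the paper): fixed weights `sub` for a
    substitution and `indel` for an insertion or deletion; a match
    is an operation of weight 0 that still counts toward L(P).
    """
    m, n = len(x), len(y)
    if m == 0 and n == 0:
        return 0.0
    inf = math.inf
    max_len = m + n  # longest path: delete all of x, insert all of y

    # prev[j][k]: minimum weight of a path with exactly k operations
    # that turns x[:i] into y[:j] (i is the current row).
    prev = [[inf] * (max_len + 1) for _ in range(n + 1)]
    for j in range(n + 1):
        prev[j][j] = j * indel  # row i = 0: j insertions

    for i in range(1, m + 1):
        cur = [[inf] * (max_len + 1) for _ in range(n + 1)]
        cur[0][i] = i * indel  # column j = 0: i deletions
        for j in range(1, n + 1):
            step = 0.0 if x[i - 1] == y[j - 1] else sub
            for k in range(1, max_len + 1):
                cur[j][k] = min(
                    prev[j - 1][k - 1] + step,  # substitution or match
                    prev[j][k - 1] + indel,     # deletion of x[i-1]
                    cur[j - 1][k - 1] + indel,  # insertion of y[j-1]
                )
        prev = cur

    # Best ratio over all feasible path lengths.
    return min(prev[n][k] / k
               for k in range(1, max_len + 1) if prev[n][k] < inf)
```

As an example of why post-normalization can fail, take X = "ab", Y = "ba", sub = 1.0 and indel = 1.1. The unique minimum-weight path is two substitutions (weight 2, length 2, ratio 1), whereas the path delete, match, insert has weight 2.2 and length 3, giving the smaller ratio 2.2 / 3, which is what normalized_edit_distance("ab", "ba", sub=1.0, indel=1.1) returns.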


Similar resources

An Efficient Uniform-Cost Normalized Edit Distance Algorithm

A common model for computing the similarity of two strings X and Y of lengths m and n respectively, with m ≥ n, is to transform X into Y through a sequence of edit operations of three types: insertion, deletion, and substitution of symbols. The model assumes a given weight function which assigns a non-negative real cost to each of these edit operations. The amortized weight for a given ...


Efficient Algorithms For Normalized Edit Distance

ÖMER EGECIOGLU, Department of Computer Science, University of California, Santa Barbara, CA 93106, USA. Abstract: A common model for computing the similarity of two strings X and Y of lengths m and n respectively, with m ≥ n, is to transform X into Y through a sequence of edit operations, called an edit sequence. The edit operations are of three types: insertion, deletion, a...


An Efficient Uniform-Cost Normalized Edit Distance Algorithm

A common model for computing the similarity of two strings X and Y of lengths m and n respectively, with m ≥ n, is to transform X into Y through a sequence of three types of edit operations: insertion, deletion, and substitution. The model assumes a given cost function which assigns a non-negative real weight to each edit operation. The amortized weight for a given edit sequence is the ratio of i...


On the computation of edit distance functions

The edit distance between two graphs on the same labeled vertex set is the symmetric difference of the edge sets. The edit distance function of a hereditary property, H, is a function of p ∈ [0, 1] and is the limit of the maximum normalized distance between a graph of density p and H. This paper uses localization for computing the edit distance function of various hereditary properties. For any ...


Parallel algorithms for fast computation of normalized edit distances

We give work-optimal and polylogarithmic-time parallel algorithms for solving the normalized edit distance problem. The normalized edit distance between two strings X and Y with lengths n ≥ m is the minimum quotient of the sum of the costs of the edit operations transforming X into Y by the length of the edit path corresponding to those edit operations. Marzal and Vidal proposed a sequential algorith...


Journal:
  • IEEE Trans. Pattern Anal. Mach. Intell.

Volume 15, Issue

Pages -

Publication date: 1993